AITopics | parameter change

Collaborating Authors

parameter change

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Gradient Descent as Loss Landscape Navigation: a Normative Framework for Deriving Learning Rules

Vastola, John J., Gershman, Samuel J., Rajan, Kanaka

arXiv.org Artificial IntelligenceNov-3-2025

Learning rules -- prescriptions for updating model parameters to improve performance -- are typically assumed rather than derived. Why do some learning rules work better than others, and under what assumptions can a given rule be considered optimal? We propose a theoretical framework that casts learning rules as policies for navigating (partially observable) loss landscapes, and identifies optimal rules as solutions to an associated optimal control problem. A range of well-known rules emerge naturally within this framework under different assumptions: gradient descent from short-horizon optimization, momentum from longer-horizon planning, natural gradients from accounting for parameter space geometry, non-gradient rules from partial controllability, and adaptive optimizers like Adam from online Bayesian inference of loss landscape shape. We further show that continual learning strategies like weight resetting can be understood as optimal responses to task uncertainty. By unifying these phenomena under a single objective, our framework clarifies the computational structure of learning and offers a principled foundation for designing adaptive algorithms.

artificial intelligence, machine learning, objective, (17 more...)

arXiv.org Artificial Intelligence

2510.26997

Country: North America > United States (0.67)

Genre: Research Report (0.64)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.72)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.48)

Add feedback

Towards Robust Influence Functions with Flat Validation Minima

Ye, Xichen, Wu, Yifan, Zhang, Weizhong, Jin, Cheng, Chen, Yifan

arXiv.org Machine LearningMay-27-2025

The Influence Function (IF) is a widely used technique for assessing the impact of individual training samples on model predictions. However, existing IF methods often fail to provide reliable influence estimates in deep neural networks, particularly when applied to noisy training data. This issue does not stem from inaccuracies in parameter change estimation, which has been the primary focus of prior research, but rather from deficiencies in loss change estimation, specifically due to the sharpness of validation risk. In this work, we establish a theoretical connection between influence estimation error, validation set risk, and its sharpness, underscoring the importance of flat validation minima for accurate influence estimation. Furthermore, we introduce a novel estimation form of Influence Function specifically designed for flat validation minima. Experimental results across various tasks validate the superiority of our approach.

artificial intelligence, machine learning, val, (17 more...)

arXiv.org Machine Learning

2505.19097

Country:

Europe > Austria > Vienna (0.14)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
(8 more...)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

Add feedback

Locate-then-Merge: Neuron-Level Parameter Fusion for Mitigating Catastrophic Forgetting in Multimodal LLMs

Yu, Zeping, Ananiadou, Sophia

arXiv.org Artificial IntelligenceMay-23-2025

Although multimodal large language models (MLLMs) have achieved impressive performance, the multimodal instruction tuning stage often causes catastrophic forgetting of the base LLM's language ability, even in strong models like Llama3. To address this, we propose Locate-then-Merge, a training-free parameter fusion framework that first locates important parameters and then selectively merges them. We further introduce Neuron-Fusion, a neuron-level strategy that preserves the influence of neurons with large parameter shifts--neurons likely responsible for newly acquired visual capabilities--while attenuating the influence of neurons with smaller changes that likely encode general-purpose language skills. This design enables better retention of visual adaptation while mitigating language degradation. Experiments on 13 benchmarks across both language and visual tasks show that Neuron-Fusion consistently outperforms existing model merging methods. Further analysis reveals that our method effectively reduces context hallucination in generation.

arxiv preprint arxiv, large language model, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2505.16703

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)

Add feedback

Dynamic Influence Tracker: Measuring Time-Varying Sample Influence During Training

Xu, Jie, Wu, Zihan

arXiv.org Machine LearningFeb-15-2025

Existing methods for measuring training sample influence on models only provide static, overall measurements, overlooking how sample influence changes during training. We propose Dynamic Influence Tracker (DIT), which captures the time-varying sample influence across arbitrary time windows during training. DIT offers three key insights: 1) Samples show different time-varying influence patterns, with some samples important in the early training stage while others become important later. 2) Sample influences show a weak correlation between early and late stages, demonstrating that the model undergoes distinct learning phases with shifting priorities. 3) Analyzing influence during the convergence period provides more efficient and accurate detection of corrupted samples than full-training analysis. Supported by theoretical guarantees without assuming loss convexity or model convergence, DIT significantly outperforms existing methods, achieving up to 0.99 correlation with ground truth and above 98\% accuracy in detecting corrupted samples in complex architectures.

artificial intelligence, data mining, machine learning, (17 more...)

arXiv.org Machine Learning

2502.10793

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Data Science > Data Mining (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Addressing Delayed Feedback in Conversion Rate Prediction via Influence Functions

Ding, Chenlu, Wu, Jiancan, Yuan, Yancheng, Fang, Junfeng, Li, Cunchun, Wang, Xiang, He, Xiangnan

arXiv.org Artificial IntelligenceFeb-1-2025

In the realm of online digital advertising, conversion rate (CVR) prediction plays a pivotal role in maximizing revenue under cost-per-conversion (CPA) models, where advertisers are charged only when users complete specific actions, such as making a purchase. A major challenge in CVR prediction lies in the delayed feedback problem-conversions may occur hours or even weeks after initial user interactions. This delay complicates model training, as recent data may be incomplete, leading to biases and diminished performance. Although existing methods attempt to address this issue, they often fall short in adapting to evolving user behaviors and depend on auxiliary models, which introduces computational inefficiencies and the risk of model inconsistency. In this work, we propose an Influence Function-empowered framework for Delayed Feedback Modeling (IF-DFM). IF-DFM leverages influence functions to estimate how newly acquired and delayed conversion data impact model parameters, enabling efficient parameter updates without the need for full retraining. Additionally, we present a scalable algorithm that efficiently computes parameter updates by reframing the inverse Hessian-vector product as an optimization problem, striking a balance between computational efficiency and effectiveness. Extensive experiments on benchmark datasets demonstrate that IF-DFM consistently surpasses state-of-the-art methods, significantly enhancing both prediction accuracy and model adaptability.

artificial intelligence, machine learning, optimization problem, (14 more...)

arXiv.org Artificial Intelligence

2502.01669

Country:

Asia > China > Jiangsu Province > Yancheng (0.05)
Asia > China > Anhui Province > Hefei (0.05)
Asia > Singapore (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report (1.00)

Industry: Marketing (0.48)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)

Add feedback

Do Influence Functions Work on Large Language Models?

Li, Zhe, Zhao, Wei, Li, Yige, Sun, Jun

arXiv.org Artificial IntelligenceDec-19-2024

Influence functions are important for quantifying the impact of individual training data points on a model's predictions. Although extensive research has been conducted on influence functions in traditional machine learning models, their application to large language models (LLMs) has been limited. In this work, we conduct a systematic study to address a key question: do influence functions work on LLMs? Specifically, we evaluate influence functions across multiple tasks and find that they consistently perform poorly in most settings. Our further investigation reveals that their poor performance can be attributed to: (1) inevitable approximation errors when estimating the iHVP component due to the scale of LLMs, (2) uncertain convergence during fine-tuning, and, more fundamentally, (3) the definition itself, as changes in model parameters do not necessarily correlate with changes in LLM behavior. Thus, our study suggests the need for alternative approaches for identifying influential samples.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2409.19998

Country: Asia > Singapore (0.04)

Genre:

Research Report > New Finding (0.46)
Research Report > Experimental Study (0.34)

Industry: Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Add feedback

Unified Parameter-Efficient Unlearning for LLMs

Ding, Chenlu, Wu, Jiancan, Yuan, Yancheng, Lu, Jinda, Zhang, Kai, Su, Alex, Wang, Xiang, He, Xiangnan

arXiv.org Artificial IntelligenceNov-30-2024

The advent of Large Language Models (LLMs) has revolutionized natural language processing, enabling advanced understanding and reasoning capabilities across a variety of tasks. Fine-tuning these models for specific domains, particularly through Parameter-Efficient Fine-Tuning (PEFT) strategies like LoRA, has become a prevalent practice due to its efficiency. However, this raises significant privacy and security concerns, as models may inadvertently retain and disseminate sensitive or undesirable information. To address these issues, we introduce a novel instance-wise unlearning framework, LLMEraser, which systematically categorizes unlearning tasks and applies precise parameter adjustments using influence functions. Unlike traditional unlearning techniques that are often limited in scope and require extensive retraining, LLMEraser is designed to handle a broad spectrum of unlearning tasks without compromising model performance. Extensive experiments on benchmark datasets demonstrate that LLMEraser excels in efficiently managing various unlearning scenarios while maintaining the overall integrity and efficacy of the models.

large language model, llmeraser, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2412.00383

Country:

Asia > China > Jiangsu Province > Yancheng (0.04)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
(4 more...)

Genre:

Research Report (0.82)
Overview (0.67)

Industry:

Leisure & Entertainment (1.00)
Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Boundary-Decoder network for inverse prediction of capacitor electrostatic analysis

Lim, Kart-Leong, Dutta, Rahul, Rotaru, Mihai

arXiv.org Artificial IntelligenceNov-28-2024

Traditional electrostatic simulation are meshed-based methods which convert partial differential equations into an algebraic system of equations and their solutions are approximated through numerical methods. These methods are time consuming and any changes in their initial or boundary conditions will require solving the numerical problem again. Newer computational methods such as the physics informed neural net (PINN) similarly require re-training when boundary conditions changes. In this work, we propose an end-to-end deep learning approach to model parameter changes to the boundary conditions. The proposed method is demonstrated on the test problem of a long air-filled capacitor structure. The proposed approach is compared to plain vanilla deep learning (NN) and PINN. It is shown that our method can significantly outperform both NN and PINN under dynamic boundary condition as well as retaining its full capability as a forward model.

artificial intelligence, inverse prediction, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2412.00113

Country:

North America > United States > Utah (0.05)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.05)
Asia > Singapore (0.05)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

A Statistical Machine Learning Approach for Adapting Reduced-Order Models using Projected Gaussian Process

Liu, Xiao, Liu, Xinchao

arXiv.org Machine LearningOct-17-2024

The Proper Orthogonal Decomposition (POD) computes the optimal basis modes that span a low-dimensional subspace where the Reduced-Order Models (ROMs) reside. Because a governing equation is often parameterized by a set of parameters, challenges immediately arise when one would like to investigate how systems behave differently over the parameter space (in design, control, uncertainty quantification and real-time operations). In this case, the POD basis needs to be updated so as to adapt ROM that accurately captures the variation of a system's behavior over its parameter space. This paper proposes a Projected Gaussian Process (pGP) and formulate the problem of adapting POD basis as a supervised statistical learning problem, for which the goal is to learn a mapping from the parameter space to the Grassmann Manifold that contains the optimal vector subspaces. A mapping is firstly found between the Euclidean space and the horizontal space of an orthogonal matrix that spans a reference subspace in the Grassmann Manifold. Then, a second mapping from the horizontal space to the Grassmann Manifold is established through the Exponential/Logarithm maps between the manifold and its tangent space. Finally, given a new parameter, the conditional distribution of a vector can be found in the Euclidean space using the Gaussian Process (GP) regression, and such a distribution is projected to the Grassmann Manifold that yields the optimal subspace for the new parameter. The proposed statistical learning approach allows us to optimally estimate model parameters given data (i.e., the prediction/interpolation becomes problem-specific), and quantify the uncertainty associated with the prediction. Numerical examples are presented to demonstrate the advantages of the proposed pGP for adapting POD basis against parameter changes.

artificial intelligence, grassmann manifold, machine learning, (17 more...)

arXiv.org Machine Learning

2410.1409

Country: North America > United States (0.28)

Genre: Research Report (0.64)

Industry: Energy > Oil & Gas (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.87)

Add feedback

Exploring Prompt Engineering Practices in the Enterprise

Desmond, Michael, Brachman, Michelle

arXiv.org Artificial IntelligenceMar-13-2024

Interaction with Large Language Models (LLMs) is primarily carried out via prompting. A prompt is a natural language instruction designed to elicit certain behaviour or output from a model. In theory, natural language prompts enable non-experts to interact with and leverage LLMs. However, for complex tasks and tasks with specific requirements, prompt design is not trivial. Creating effective prompts requires skill and knowledge, as well as significant iteration in order to determine model behavior, and guide the model to accomplish a particular goal. We hypothesize that the way in which users iterate on their prompts can provide insight into how they think prompting and models work, as well as the kinds of support needed for more efficient prompt engineering. To better understand prompt engineering practices, we analyzed sessions of prompt editing behavior, categorizing the parts of prompts users iterated on and the types of changes they made. We discuss design implications and future directions based on these prompt engineering practices.

engineering, instruction, use case, (15 more...)

arXiv.org Artificial Intelligence

2403.0895

Country:

North America > Canada > Ontario > Toronto (0.04)
Europe (0.04)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback